Stability and optimality in stochastic gradient descent
Authors
Abstract
Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed AISGD. For statistical efficiency, AISGD employs averaging of the iterates, which achieves the optimal Cramér-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value. For numerical stability, AISGD employs an implicit update at each iteration, which is related to proximal operators in optimization. In practice, AISGD achieves competitive performance with other state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.
arXiv:1505.02417v2 [stat.ME] 20 Oct 2015
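The abstract combines two ingredients: an implicit update, in which the gradient is evaluated at the new iterate θ_n rather than at θ_{n-1}, and averaging of the iterates. As an illustrative sketch only (not the authors' reference implementation), the Python below applies both to least-squares regression. There the implicit equation θ_n = θ_{n-1} + γ_n (y_n - x_nᵀθ_n) x_n can be solved in closed form, giving the ordinary step shrunk by the factor 1/(1 + γ_n‖x_n‖²); the same θ_n is the proximal step argmin_θ {½‖θ - θ_{n-1}‖² + γ_n · ½(y_n - x_nᵀθ)²}. The learning-rate schedule γ_n = γ₀ n^(-α), the defaults γ₀ = 1 and α = 0.6, the function name `aisgd_least_squares`, and the synthetic data are all assumptions made for this example.

```python
import random


def aisgd_least_squares(data, dim, gamma0=1.0, alpha=0.6):
    """Averaged implicit SGD for least squares (illustrative sketch).

    data: iterable of (x, y) pairs, where x is a list of `dim` floats.
    gamma0, alpha: assumed schedule gamma_n = gamma0 * n ** (-alpha).
    Returns (theta_n, theta_bar_n): last implicit iterate and running average.
    """
    theta = [0.0] * dim      # implicit SGD iterate theta_n
    theta_bar = [0.0] * dim  # average of theta_1, ..., theta_n
    for n, (x, y) in enumerate(data, start=1):
        gamma = gamma0 * n ** (-alpha)
        norm2 = sum(xi * xi for xi in x)
        # Residual at the previous iterate theta_{n-1}.
        resid = y - sum(xi * ti for xi, ti in zip(x, theta))
        # Closed-form solution of theta_n = theta_{n-1} + gamma*(y - x^T theta_n)*x:
        # the factor 1/(1 + gamma*||x||^2) shrinks large steps (stability).
        step = gamma * resid / (1.0 + gamma * norm2)
        theta = [ti + step * xi for ti, xi in zip(theta, x)]
        # Running average: theta_bar_n = theta_bar_{n-1} + (theta_n - theta_bar_{n-1}) / n.
        theta_bar = [tb + (ti - tb) / n for tb, ti in zip(theta_bar, theta)]
    return theta, theta_bar


# Usage on synthetic data (hypothetical: true parameter and noise level are made up).
rng = random.Random(0)
true_theta = [1.0, -2.0]


def stream(n_samples):
    """Yield (x, y) with x ~ N(0, I) and y = x^T true_theta + N(0, 0.1^2) noise."""
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in true_theta]
        y = sum(xi * ti for xi, ti in zip(x, true_theta)) + rng.gauss(0.0, 0.1)
        yield x, y


theta_n, theta_bar = aisgd_least_squares(stream(20000), dim=2)
print("last iterate:", theta_n)
print("averaged iterate:", theta_bar)
```

In this setting a much larger γ₀ (for example 100) should still behave well, because the shrinkage factor caps the effective step along x_n at about 1/‖x_n‖²; an explicit update with the same schedule multiplies the error by roughly 1 - γ_n‖x_n‖² along x_n and can diverge.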
Similar resources
Towards stability and optimality in stochastic gradient descent
Lemmas 1, 2, 3 and 4, and Corollary 1, were originally derived by Toulis and Airoldi (2014). These intermediate results (and Theorem 1) provide the necessary foundation to derive Lemma 5 (only in this supplement) and Theorem 2 on the asymptotic optimality of θ̄n, which is the key result of the main paper. We fully state these intermediate results here for convenience but we point the reader to t...
Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities
The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...
A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and ad...
Towards Stability and Optimality in Stochastic Gradient Descent
Iterative procedures for parameter estimation based on stochastic gradient descent (sgd) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on sgd are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed a...
Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network
Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...
Journal title:
- CoRR
Volume: abs/1505.02417, Issue:
Pages: -
Publication date: 2015